JAMIA Open — Latest Matching Preprints

1

Early Detection of Absurdity Signals in Pharmacovigilance: A Machine Learning Ensemble Approach to Identify Rare Adverse Drug Reactions

Dasgupta, R.

2026-02-09 health informatics 10.64898/2026.02.06.26345783 medRxiv

Top 0.1%

18.5%

Background: Traditional pharmacovigilance methods based on biostatistical approaches systematically exclude outliers and rare events, potentially missing critical safety signals. These methods fail to detect micro-clusters of adverse events and comorbidity patterns that may indicate serious but low-frequency adverse drug reactions (ADRs). We introduce the concept of absurdity signal detection: the identification of statistically anomalous but clinically significant adverse event patterns that conventional methods dismiss as outliers. Methods: We developed an ensemble machine learning framework combining five distinct algorithms (Random Forest, Gradient Boosting, XGBoost, Neural Networks, and Support Vector Machines) to analyze FDA Adverse Event Reporting System (FAERS) data. The system employs outlier-inclusive modeling, multi-dimensional cluster detection, and severity-weighted propensity scoring. We validated our approach on Losartan, analyzing 500 adverse event reports to detect absurdity signals that may have been missed by conventional biostatistical surveillance. Results: Our ensemble approach achieved 75% accuracy in identifying high-risk adverse events, with the best-performing model successfully detecting 15 distinct absurdity signals. The top five identified events were: cough (propensity score 1.525), angioedema (1.298), insomnia (1.290), nausea (1.180), and hyperkalemia (1.114). Notably, our method identified several rare but severe ADRs that would have been excluded as statistical outliers in traditional disproportionality analyses. The ensemble approach demonstrated superior performance compared to individual models, with inter-model agreement providing an additional confidence metric for signal validation. Conclusions: Machine learning-based absurdity signal detection offers a paradigm shift in pharmacovigilance by preserving and analyzing rare adverse events rather than excluding them.
This approach has significant implications for patient safety, potentially preventing serious adverse events in vulnerable populations with atypical response profiles. Our methodology is scalable, validated against FDA data sources, and provides a framework for real-time safety monitoring in the $138 billion pharmaceutical industry. Future work will extend this approach to drug-drug interaction detection and personalized risk stratification.

2

Extending the OMOP Common Data Model to Support Observational Peripheral Vascular Disease Research

Leese, P. J.; McIntee, T.; Browder, S. E.; Laivuori, M.; Alabi, O.; McGinigle, K. L.

2026-02-03 health informatics 10.64898/2026.02.01.26345276 medRxiv

Top 0.1%

17.8%

Background: Peripheral artery disease (PAD) and chronic limb-threatening ischemia (CLTI) cause substantial morbidity and mortality, yet research progress is limited by fragmented, non-standardized data. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) provides a standardized framework for electronic health record (EHR) research but lacks domain-specific detail for peripheral vascular diseases. This study aimed to develop and test a vascular-specific OMOP CDM extension to improve data standardization, enable reproducible real-world analyses, and support precision medicine research in PAD and CLTI. Methods: We identified patients with PAD, CLTI, or diabetic foot ulcers who sought care within the UNC Health System between April 2014 and July 2024. Standard OMOP tables were supplemented with peripheral vascular laboratory (PVL) data and state death records. Intermediate tables were designed for key clinical domains (e.g., smoking, comorbidities, revascularizations) to enhance reusability. Predictive models for revascularization and mortality were developed using logistic regression with Bayesian weighting and Markov Chain Monte Carlo feature selection. Clinical Application: The revascularization model displayed high performance with and without important vascular variables (AUC = 0.970 and AUC = 0.969, respectively), while the mortality model demonstrated moderate accuracy (AUC = 0.656) that improved with inclusion of vascular-specific features (AUC = 0.752). Conclusions: This vascular OMOP extension represents one of the first specialty-specific frameworks for peripheral vascular research. By extending the OMOP CDM to a vascular domain, this work advances both the technical framework and scientific capability of real-world data research in limb preservation and precision vascular medicine.

3

Improvement in Albuminuria Screening Associated with EHR Decision Support Change

Zafar, W.; Tavares, S.; Hu, Y.; Brubaker, L.; Green, J.; Mehta, S.; Grams, M. E.; Chang, A. R.

2026-02-14 health informatics 10.64898/2026.02.09.26345709 medRxiv

Top 0.1%

14.7%

Background: Albuminuria is associated with increased risk of cardiovascular disease (CVD), heart failure, and progression of chronic kidney disease (CKD). Early detection of albuminuria, done through spot urine albumin creatinine ratio (UACR) testing, enables more accurate risk stratification and timely use of preventative therapies. UACR testing remains unacceptably low in the hypertension population. Methods: We evaluated two EHR-embedded clinical decision support (CDS) strategies at Geisinger Health System intended to increase UACR testing in individuals with hypertension: an OurPractice Advisory (OPA) from Jan 2022 to Aug 2022; and a Health Maintenance Topic (HMT) in the Care Gaps section of Storyboard from Aug 2022 that continues to date. We evaluated UACR rates from 2020 to 2023 in Geisinger primary care and compared them with a control group of healthcare systems in the Optum Labs Data Warehouse (OLDW). Patients were excluded if they had UACR testing in the preceding 3 years, had diabetes or CKD, or were receiving palliative/hospice care. Results: We included 58,876 individuals in Geisinger (mean age 59.4 years, 49.6% female) and 1,427,754 in OLDW (61.0 years, 49% female). UACR testing in Geisinger (2.97% in 2020; 2.8% in 2021; 9.7% in 2022; 17.5% in 2023) showed a significant increase compared to the control health systems (2.08%, 2.26%, 3.35% and 3.40% respectively). Results were consistent after adjusting for age, sex and race. Conclusion: OPA increased UACR testing ~3-fold whereas the HMT was associated with further improvements (~6-fold vs. baseline) among those with hypertension, suggesting an important role for CDS design in closing care gaps.
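
The fold changes quoted in the conclusion can be checked against the reported yearly testing rates. The sketch below assumes that "baseline" means the 2020 rate, which the abstract does not state explicitly; the variable and function names are illustrative and not taken from the preprint.

```python
# UACR testing rates (%) by year, as reported in the abstract.
geisinger = {2020: 2.97, 2021: 2.8, 2022: 9.7, 2023: 17.5}
control = {2020: 2.08, 2021: 2.26, 2022: 3.35, 2023: 3.40}


def fold_change(rates, year, baseline_year=2020):
    """Ratio of a year's testing rate to the baseline year's rate.

    Using 2020 as the baseline is an assumption made for this check.
    """
    return rates[year] / rates[baseline_year]


for year in (2022, 2023):
    print(
        year,
        f"Geisinger x{fold_change(geisinger, year):.2f}",
        f"control x{fold_change(control, year):.2f}",
    )
```

Under that assumption the Geisinger ratios come out near 3.3 (2022) and 5.9 (2023), in line with the rounded "~3-fold" and "~6-fold" figures, while the control ratios stay below 2.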

4

Development and validation of an algorithm to identify front-line clinicians using EHR audit log data

Baratta, L. R.; Wang, J.; Osweiler, B. W.; Lew, D.; Eiden, E.; Kannampallil, T. G.; Lou, S. S.

2026-02-16 health informatics 10.64898/2026.02.13.26346268 medRxiv

Top 0.1%

14.3%

Background: Interprofessional teams are central to high quality patient care. However, identifying the clinician primarily responsible for a patient requires labor-intensive methodologies. Although electronic health record (EHR) audit logs offer a scalable alternative, their use for identifying frontline clinicians is underdeveloped. Objective: To develop and validate an algorithm utilizing EHR audit logs to identify the primary frontline clinician per patient day of an encounter and to describe care continuity patterns. Method: This was a cross-sectional cohort study of adult inpatient medicine encounters at 12 hospitals in a single health system using a shared EHR. Admissions from February 1, 2023 to April 30, 2023, with length of stay of at least 3 days and without an intensive care unit admission were included. Four algorithm iterations were designed to identify the attending physician, resident, or advanced practice provider (APP) primarily responsible for patient care on each patient-day. Performance of each algorithm was compared with manual chart review on 1,401 patient-days from 246 randomly sampled patient encounters. Accuracy between an algorithm and the chart review standard was compared using McNemar's test with Bonferroni-adjusted p-values. Results: The best performing algorithm correctly identified the primary clinician responsible for patient care on 91% of patient-days (1,268/1,401), outperforming the naive approach using frequency of actions (78% accuracy, 1,098/1,401, p<0.001). Algorithm errors were attributable to misidentified specialty and ambiguity on days with transitions of care or shared responsibilities between clinicians. The best performing algorithm was applied to the entire cohort (5,801 encounters and 34,001 patient-days) where it identified attending physicians, resident physicians, and APPs as the frontline clinician for 26,750 (79%), 3,106 (9%), and 4,145 (12%) of patient-days respectively. Each encounter had a median of 1 (IQR 0-2) handoff between frontline clinicians. Conclusions: We developed a scalable, audit log-based algorithm to determine the front-line clinician with excellent accuracy compared with manual chart review.
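
The "naive approach using frequency of actions" that serves as the comparison baseline can be sketched as follows. This is only an illustration of the idea: the row layout, clinician identifiers, and function names are hypothetical, and the preprint's four algorithm iterations are not described in enough detail in the abstract to reproduce.

```python
from collections import Counter, defaultdict

# Hypothetical audit log rows: (encounter_id, date, clinician_id).
# Real audit logs carry many more fields; these three are assumed here.
audit_rows = [
    ("enc1", "2023-02-01", "attending_A"),
    ("enc1", "2023-02-01", "resident_B"),
    ("enc1", "2023-02-01", "attending_A"),
    ("enc1", "2023-02-02", "app_C"),
    ("enc1", "2023-02-02", "app_C"),
    ("enc1", "2023-02-02", "attending_A"),
]


def naive_frontline(rows):
    """For each patient-day, pick the clinician with the most logged actions."""
    counts = defaultdict(Counter)
    for encounter_id, day, clinician_id in rows:
        counts[(encounter_id, day)][clinician_id] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def accuracy(predicted, reference):
    """Share of chart-reviewed patient-days where the prediction matches."""
    hits = sum(predicted.get(key) == label for key, label in reference.items())
    return hits / len(reference)


predicted = naive_frontline(audit_rows)
print(predicted)
```

Comparing `predicted` with a chart-review dictionary keyed the same way gives the kind of patient-day accuracy the abstract reports (for example 1,098/1,401 for the naive baseline).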

5

Community Detection and Patient Experience Analysis in Reddit Conversations on Janus Kinase Inhibitors using Large Language Models

Agboola, T. O.; Akbar, S.; Duruaku, U.; Al-Janabi, A.; Ioannou, A.; Loh, J. H. M.; Murphy, C.; Yiu, Z. Z. N.; Ajao, O.

2026-02-04 health informatics 10.64898/2026.02.02.26345429 medRxiv

Top 0.1%

14.1%

The emergence of Janus kinase (JAK) inhibitors, a relatively new class of medications for autoimmune and inflammatory conditions, has been accompanied by reports of adverse effects observed during clinical trials. However, uncertainty over their safety and efficacy in wider, unselected populations has led to discussion and speculation on social media such as Reddit. Social networks represent a novel, rich source of real-world pharmacovigilance data. They are also an environment where unverified information about these medications may circulate. This paper analyzes Reddit conversations related to JAK inhibitors, applying graph modeling and community detection techniques using Neo4j and the Louvain algorithm. Data from 2011 to 2024 were collected, cleaned, and used to construct a directed graph, incorporating posts, comments, users, and drug mentions as nodes and their interactions as edges. Advanced computational methods, including large language models, were utilized to analyze textual data and identify patient-reported experiences that diverge from current medical consensus. This study systematically maps online discourse and identifies key participants to understand how patient experiences and concerns about JAK inhibitors are shared within communities. The findings show that various subreddits serve as hubs of information in which key influencers spread both positive and negative information within the Reddit ecosystem. Highlighting the potential to integrate graph-based approaches, Neo4j, and advanced LLMs in real-time pharmacovigilance, this study presents compelling evidence of the emerging conversations surrounding JAK inhibitors and how they affect public health. Author Summary: People often turn to Reddit to share their experiences with medications, including Janus kinase (JAK) inhibitors, which are used to treat autoimmune conditions such as arthritis, eczema, and alopecia areata. These drugs are fairly recent, and some safety concerns have been identified, making online discussions about them a mixture of personal stories, questions, and statements that may not correspond to established medical literature. In this research we analyzed over ten years of Reddit discussions to explore how individuals discuss JAK inhibitors and how both correct and possibly misleading information circulates within groups. We integrated graph-based techniques, which illustrate the connections among users and conversations, with AI tools that identify claims at odds with clinical guidelines. We applied the term "divergent patient experiences" exclusively to comments that contradict regulatory or evidence-based sources; personal accounts and individuals' feelings are not classified as divergent patient experiences. Our findings demonstrate that a very small number of users initiate a large proportion of conversations and that discussions tend to revolve around the main health topics. Using social media to monitor public health opinions in this way can surface information about real patient concerns, but it also shows the need for expert supervision when AI is used to appraise health information shared online.

6

Clinicians' Rationale for Editing Ambient AI-Drafted Clinical Notes: Persistent Challenges and Implications for Improvement

Guo, Y.; Hu, D.; Yang, Z.; Chow, E.; Tam, S.; Perret, D.; Pandita, D.; Zheng, K.

2026-02-22 health informatics 10.64898/2026.02.20.26346729 medRxiv

Top 0.1%

14.1%

Objective: The use of ambient AI documentation tools is rapidly growing in US hospitals and clinics. Such tools generate the first draft of clinical notes from scribed patient-provider conversations, which clinicians can then review and edit before signing into electronic health records (EHR). Understanding how and why clinicians make modifications to AI-generated drafts is critical to improving AI design and clinical efficiency, yet it has been under-studied. This study aims to address this gap. Materials and Methods: We conducted semistructured interviews with 30 clinicians from the University of California, Irvine Health who used a commercial ambient AI tool in routine outpatient care. We invited them to describe how and why they edited AI drafts based on both their personal experience and review of some real-world examples identified from our previous studies. Results: Modifications to AI drafts were primarily made to improve clinical accuracy and specialty-specific precision, reduce medico-legal and liability risk, and meet billing, coding, and documentation standards. Such editing was necessary due to reasons such as transcription errors, speaker attribution mistakes, overconfident statements without evidence, missing key clinical details, and the AI's lack of information about the patient context. Conclusion and Discussion: Improving ambient AI documentation will require coordinated effort from vendors, institutions, and clinicians. Key targets include core model reliability (e.g., transcription accuracy), specialty- and encounter-level customization, clinician-level personalization, more effective EHR integration, and institutional support (e.g., training, governance, and standardized review guidance), complemented by clinicians' adaptive communication strategies that strengthen human-AI collaboration.

7

A bibliometric review of explainable AI in diabetes risk prediction: Trends, gaps, and knowledge graph opportunities

Van, T. A.

2026-04-20 health informatics 10.64898/2026.04.16.26351069 medRxiv

Top 0.1%

13.8%

Background: Type 2 diabetes mellitus (T2DM) is a leading global public health challenge. Machine learning (ML) combined with Explainable AI (XAI) is increasingly applied to T2DM risk prediction, but the field lacks a quantitative overview of methodological trends and integration gaps. Methods: We present a structured synthesis and critical analysis of the XAI literature on T2DM risk prediction, combining (i) quantitative bibliometric analysis of a two-database corpus (N = 2,048 documents from Scopus and PubMed/MEDLINE, deduplicated via a transparent three-tier pipeline) and (ii) an in-depth selective review of 15 highly cited papers. Reporting follows PRISMA 2020, adapted for metadata-based synthesis; analyses include keyword frequency, rule-based thematic clustering, and publication trend analysis. Results: The field grew rapidly, from 36 documents (2020) to 866 (2025). SHAP and LIME dominate XAI methods; XGBoost and Random Forest dominate ML models. Critically, knowledge graph (KG) and graph neural network (GNN) terms appeared in only 17 documents (~0.83%) compared with 906 for XAI methods, a 53.3:1 disparity. This gap is consistent across both databases, which share 33.2% of their records, ruling out a single-database artifact. The selective review confirmed that none of the 15 highly cited papers combined all three components, ML, XAI, and KG, in T2DM risk prediction. Conclusions: The XAI for T2DM risk prediction field exhibits a clinical interpretability gap: statistical explanations are rarely linked to structured clinical pathways. We propose a three-layer conceptual framework (Predictive → Explainability → Knowledge) that integrates KG as a supplementary semantic layer, with potential applications in clinical decision support and population-level screening. The framework does not perform true causal inference but structures explanations around established pathophysiological knowledge.
This study contributes a transferable methodology and a quantified research gap to guide future work integrating ML, XAI, and structured medical knowledge.
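
The headline disparity figures follow directly from the counts given in the Results. A short arithmetic check, using only numbers stated in the abstract (the variable names are illustrative):

```python
total_documents = 2048   # deduplicated two-database corpus
kg_gnn_documents = 17    # documents containing KG/GNN terms
xai_documents = 906      # documents containing XAI method terms

kg_share_pct = 100 * kg_gnn_documents / total_documents
disparity = xai_documents / kg_gnn_documents

print(f"KG/GNN share of corpus: {kg_share_pct:.2f}%")
print(f"XAI to KG/GNN ratio: {disparity:.1f}:1")
```

Both values agree with the reported ~0.83% share and 53.3:1 ratio.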

8

The Associations of Diabetes Mellitus and Obesity on Osteoporosis

Thomas, M. G.; Jayasuriya, A. C.

2026-03-18 orthopedics 10.64898/2026.03.16.26348517 medRxiv

Top 0.1%

10.3%

Objectives: Osteoporosis is a common and debilitating condition that disproportionately affects older adults, particularly women, leading to increased fracture risk and reduced quality of life. While traditional risk factors such as age, hormonal changes, and lifestyle are well established, the impacts of diabetes and obesity on osteoporosis remain unclear. This study aimed to investigate associations between diabetes, obesity, and osteoporosis diagnosis in Caucasian women aged 64 years and older. Materials and Methods: Data on osteoporosis diagnosis, diabetes diagnosis, and body mass index (BMI) were obtained from the publicly available Study of Osteoporotic Fractures (SOF) database. Statistical analyses were conducted using IBM SPSS software. Associations between diabetes, BMI, and osteoporosis were evaluated at two study visits (visit 1 and visit 8). Analysis of variance (ANOVA) and correlation analyses were used to assess relationships among variables. Results: No significant association was found between diabetes and osteoporosis at visit 1 (p = 0.966); however, a statistically significant association emerged at visit 8 (p < 0.001). A weak negative correlation between diabetes and osteoporosis was observed at visit 8 (r = -0.068, p < 0.001), indicating that participants with diabetes were slightly less likely to be diagnosed with osteoporosis. BMI category was significantly associated with osteoporosis at both visits (p < 0.001). Post hoc analyses revealed that overweight and obese women had a lower likelihood of osteoporosis than underweight or normal-weight participants. Conclusions: Diabetes showed no consistent association with osteoporosis diagnosis, whereas higher BMI appeared to exert a protective effect against osteoporosis in older women.

9

Nationwide Prediction of Missed and Cancelled Appointments Using Real-World EHR Data

Miran, S. A.; Cheng, Y.; Faselis, C.; Brandt, C.; Vasaitis, S.; Nesbitt, L.; Zanin, L.; Tekle, S.; Ahmed, A.; Nelson, S. J.; Zeng-Treitler, Q.

2026-04-13 health informatics 10.64898/2026.04.08.26349942 medRxiv

Top 0.1%

10.3%

Objectives: To develop and evaluate predictive models for unused outpatient appointments (missed or cancelled) using a large national electronic health record (EHR) repository in the United States. Design: Retrospective observational study using machine learning and statistical modeling. Setting: A U.S. national electronic health record repository (Cerner Real World Database) covering healthcare encounters from 2010 to 2025. Participants: Adult patients aged ≥18 years with routine outpatient encounters recorded in the database. One outpatient appointment with a known status was randomly selected per patient, resulting in a final analytic sample of 5,699,861 encounters. Primary and Secondary Outcome Measures: The primary outcome was whether the index outpatient appointment was attended or unused (missed or cancelled). Model performance was evaluated using area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Methods: Predictors included patient characteristics (demographics and insurance type), appointment characteristics (day, time, season, and urbanicity), prior cancellation rate, and time gap between the index appointment and the previous visit. We compared the predictive performance of two machine learning models (random forest classifier and extreme gradient boosting (XGBoost)) with logistic regression. An explainable AI analysis of feature impact was performed on the final XGBoost model. Results: Among 5,699,861 outpatient encounters, 3,650,715 (64.0%) were attended and 2,049,146 (36.0%) were unused. XGBoost achieved the best predictive performance on the test dataset (AUC = 0.95), followed by random forest (AUC = 0.92) and logistic regression (AUC = 0.89). Feature impact score analysis revealed highly non-linear associations between predictors and the risk of unused appointments at the individual level. Conclusions: Unused outpatient appointments can be accurately predicted using routinely available EHR data.
Integrating predictive models into scheduling workflows may improve healthcare efficiency and optimize appointment management.
Article Summary: Strengths and limitations of this study
- This study used one of the largest national electronic health record datasets to develop predictive models for unused outpatient appointments.
- Multiple modeling approaches, including logistic regression and machine learning methods (random forest and XGBoost), were compared to evaluate predictive performance.
- An explainable artificial intelligence method was applied to quantify feature impact and improve model interpretability.
- The retrospective design and reliance on routinely collected EHR data may introduce data quality limitations and unmeasured confounding.
- The database did not distinguish clearly between cancelled appointments and no-shows.

10

Gender-Specific Osteoporosis Risk Prediction Using Longitudinal Clinical Data and Machine Learning

Tripathy, S.; Saripalli, L.; Berry, K.; Jayasuriya, A. C.; Kaur, D.; Syed, F.

2026-02-17 orthopedics 10.64898/2026.02.13.26346244 medRxiv

Top 0.1%

10.0%

Osteoporosis is a silent yet debilitating disease that often remains undetected until fractures occur. While early prediction is crucial, most studies combine male and female datasets to train a single model, introducing bias since osteoporosis risk and progression differ by gender. This study aims to develop gender-specific machine learning models that leverage longitudinal data to predict osteoporosis risk, providing tailored insights for men and women. Data were obtained from two large longitudinal cohorts: the Study of Osteoporotic Fractures (SOF) for women and the Osteoporotic Fractures in Men Study (MrOS) for men. Multiple ML algorithms were trained and evaluated for each sex, with model performance assessed using the area under the receiver operating characteristic curve (AUC-ROC). Among the tested models, the XGBoost model demonstrated the best performance for women, achieving an AUC-ROC of 0.93 using SOF data. For men, the Random Forest model achieved an AUC-ROC of 0.89 using MrOS data. Feature importance analysis identified sex-specific osteoporosis risk factors, underscoring the need for tailored prediction and management. By revealing male and female risk factors and reducing bias from combined datasets, the work advances personalized care and supports earlier, effective clinical intervention to prevent fractures and improve health outcomes.

11

Shared Strides: Community-based, high-throughput biomechanics data collection in knee osteoarthritis

Qualter, J. M.; McCloskey, R. C.; Stofer, K. A.; Qiu, P.; Tian, Z.; Vincent, H. K.; Costello, K. E.

2026-03-25 orthopedics 10.64898/2026.03.23.26349064 medRxiv

Top 0.1%

9.9%

Objective: This analysis assessed the acceptability and recruitment implications of a high-throughput, community-based biomechanics protocol among individuals with knee osteoarthritis (OA). Design: During the Shared Strides Study, high-throughput markerless biomechanics assessment was conducted at community sites to help facilitate research engagement in the OA population. In this cross-sectional study, biomechanics data during a set of activities of daily living (ADLs) and questionnaire data were collected. Adults aged 40 years or older with knee OA participated at one of four sites across Gainesville, FL: two on-campus and two community-based. Eligible individuals were either screened over the phone and scheduled for a specific date and time or screened on site for potential same-day participation. Participant acceptability of the community-based biomechanics data collection approach was assessed using a 15-item custom questionnaire. Recruitment characteristics and participant preferences were compared across sites. Results: The high-throughput community-based data collection approach was well received. Compared with on-campus sites, community-based sites had higher engagement from walk-in participants and new research participants (40% of the sample). Familiarity with, and distance to, a data collection site were important factors in research engagement in this population. No differences in demographic characteristics existed between sites (p > 0.05), but recruitment resulted in a large sample size (n = 85) likely representative of the communities surrounding the selected sites. Conclusions: Integrating markerless motion capture with a community-based research approach may enhance the participant experience and facilitate larger, more heterogeneous sample sizes, ultimately reducing bias and homogeneity in current OA biomechanics research.

12

Randomized Trial Protocol: Epic Generative AI Chart Summarization Tool to Reduce Ambulatory Provider Cognitive Task Load

Chin, A. T.; Zhu, N.; Kingsley, T. C.; Mynampati, P.; Phipps, Y.; Romanov, A.; Vangala, S.; Weng, M.; Wisk, L. E.; Woo, H.; Mafi, J. N.; Lukac, P. J.

2026-02-22 health informatics 10.64898/2026.02.20.26346503 medRxiv

Top 0.1%

8.8%

Background: EHR documentation and chart review contribute to clinician workload and burnout. To alleviate pre-charting burden, Epic has released a new generative AI chart summarizer tool, which has become widely adopted; however, its impact has not been examined in randomized trials. Objective: To evaluate whether access to an Epic generative AI chart summarization tool reduces cognitive task load among ambulatory providers compared with usual care. Methods: Two-arm, parallel-group randomized controlled trial among ambulatory clinicians across multiple specialties. Clinicians will be randomized 1:1 to tool access versus usual care for 90 days. The primary outcome is change in a 4-item physician task load (PTL) adapted for the pre-charting task. Exploratory outcomes include EHR-derived time metrics (Caboodle and Signal), professional fulfillment/burnout (PFI), usability (SUS), clinician satisfaction, an aggregated patient experience item from CG-CAHPS, and reported safety-related metrics. Ethics and Dissemination: Analyses will use clinician-level survey responses and aggregated EHR metrics; no patient-level protected health information will be included in the analytic dataset. Results will be disseminated via preprint and peer-reviewed publication.
Article Summary: Strengths and limitations of this study
- This study is a 3-month pragmatic randomized controlled trial evaluating a native EHR-embedded generative AI tool that summarizes prior clinical notes for ambulatory encounters.
- The primary outcome uses a validated cognitive task load instrument adapted specifically for pre-charting activities.
- Exploratory outcomes include objective EHR-derived time metrics, validated psychometric measures of burnout and professional fulfillment, and clinician-reported survey measures assessing perceived usefulness of the tool.
- The trial is single-centered, which may limit generalizability, and the intervention is optional-use and unblinded, which may attenuate observed effects and introduce performance bias.

13

Decision Curve Analysis for Evaluating Machine Learning Models for Next-Day Transfer Out of ICU

Pozo, M.; Pape, A.; Locke, B.; Pettine, W. W.

2026-04-21 health informatics 10.64898/2026.04.19.26351213 medRxiv

Top 0.1%

8.8%

Timely identification of intensive care unit (ICU) patients likely to exit the unit can support anticipatory workflows such as chart review, eligibility screening, and patient outreach prior to transfer. Most ICU discharge prediction studies report discrimination and calibration, but these metrics do not quantify the decision consequences of acting on predictions. Using adult ICU admissions from MIMIC-IV, we represented each ICU stay as a sequence of daily clinical summaries and trained logistic regression, random forest, and XGBoost models to predict next-day ICU transfer. Models achieved ROC AUC of 0.80-0.84 with differing calibration. We evaluated decision utility using decision curve analysis (DCA), where positive predictions trigger proactive review. Across thresholds, model-guided strategies outperformed review-all, review-none, and a simple clinical rule. To translate net benefit into implementable operations, we modeled a clinical trial recruitment workflow with an 8-hour daily time constraint, incorporating chart review and consent effort. At a feasible operating threshold (0.23), the model flagged ~23 charts/day and yielded ~1.23 enrollments/day under conservative eligibility and consent assumptions. These results demonstrate that DCA provides a transparent framework for determining when ICU transfer predictions are worth using and how thresholds should be selected to align with real-world workflow constraints. Data and Code Availability: This research has been conducted using data from MIMIC-IV. Researchers can request access via PhysioNet. Implementation code is available upon request.
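
Decision curve analysis compares strategies by net benefit at a chosen threshold probability. The sketch below uses the standard net benefit formula, NB = TP/n - (FP/n) * pt / (1 - pt), on toy data; it is not the authors' code (which is available only on request), and the labels, probabilities, and function names are made up for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of reviewing every case with predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def net_benefit_review_all(y_true, threshold):
    """Net benefit of the review-all strategy (every case is flagged)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * (threshold / (1 - threshold))


# Toy example: 1 = transferred out of the ICU the next day (invented data).
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.4, 0.3, 0.1]
threshold = 0.25

print("model:", net_benefit(y_true, y_prob, threshold))
print("review-all:", net_benefit_review_all(y_true, threshold))
print("review-none:", 0.0)  # flagging nobody has zero net benefit by definition
```

Sweeping `threshold` over a range and plotting the three curves gives the decision curve; a model is worth using at thresholds where its curve lies above both reference strategies.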

14

Unseen Insights: An AI-Powered Exploration of Secure Patient Messages in Ophthalmology

Kim, J. Y.; Fazal, Z. Z.; Wang, S. Y.; Chang, R. T.; Linos, E.; Sepah, Y.

2026-02-05 health informatics 10.64898/2026.02.03.26345491 medRxiv

Top 0.1%

8.5%

Objective: To characterize the clinical and administrative concerns communicated through secure ophthalmology messaging and to assess differences in message content across patient sociodemographic groups. Design: Cross-sectional study of de-identified, patient-initiated secure messages sent between June 2014 and July 2024. Participants: Patients with ophthalmic conditions who initiated secure electronic health record portal messages. Of 48 516 extracted message threads, 30 390 patient medical advice request messages from 4 817 unique patients were included after exclusion of questionnaires, courtesy messages, and clinician responses. Participants were 55.5% female, 56.9% aged 50 years or older, 48.7% White, and 85.7% non-Hispanic. Methods: Natural language processing and large language model-assisted topic classification were used to categorize message content. Differences in message frequency by demographic subgroup were assessed using 2-proportion z tests. Main Outcomes and Measures: Distribution of message topics and frequency of clinical concerns stratified by age, sex, race, ethnicity, and marital status. Results: Nearly half of all messages addressed administrative issues, including scheduling, medication refills, and insurance. Among clinical concerns, vision disturbances (20.8%), glaucoma-related symptoms (8.7%), imaging or tumor-related questions (7.5%), and postoperative concerns (7.4%) were most common. Message content differed significantly by demographic characteristics. Non-White patients more frequently raised issues related to pharmacy refills, insurance, glaucoma, and disability documentation, whereas White patients more often reported surgical concerns. Older patients more frequently messaged about glaucoma, surgery, and tumor-related issues, while female patients more often reported complications and swelling or infection. 
Conclusions: Secure patient messages frequently include clinically relevant symptoms with potential triage implications and demonstrate demographic differences in care-seeking behavior. Systematic analysis of message content may support safer triage, improved workflow efficiency, and more equitable delivery of ophthalmic care.
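
For readers unfamiliar with the 2-proportion z test named in the Methods, the following is a minimal sketch of the standard pooled-variance form. It is not the authors' code. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 == p2.

    x1, x2 are the counts of messages on a topic in each subgroup;
    n1, n2 are the total message counts in each subgroup.
    Uses the pooled standard error (normal approximation).
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("standard error is zero; test is undefined")
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value
```

With made-up counts, `two_proportion_z_test(60, 100, 40, 100)` compares a 60% rate against a 40% rate. The normal approximation is generally considered reasonable only when each group has enough messages in both the "topic" and "not topic" categories.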

15

MIMIC-IV-Phenotype-Atlas (MIPA) : A Publicly Available Dataset for EHR Phenotyping

Yamga, E.; Goudrar, R.; Despres, P.

2026-04-24 health informatics 10.64898/2026.04.16.26350888 medRxiv

Top 0.1%

8.4%

Introduction: Secondary use of electronic health records (EHRs) often requires transforming raw clinical information into research-grade data. A central step in this process is EHR phenotyping - the identification of patient cohorts defined by specific medical conditions. Although numerous approaches exist, from ICD-based heuristics to supervised learning and large language models (LLMs), the field lacks standardized benchmark datasets, limiting reproducibility and hindering fair comparison across methods. Methods: We developed the MIMIC-IV Phenotype Atlas (MIPA) dataset, an adaptation of MIMIC-IV that provides expert-annotated discharge summaries across 16 phenotypes of varying prevalence and complexity. Two independent clinicians reviewed and labeled the discharge summaries, resolving disagreements by consensus. In parallel, we implemented a processing pipeline that extracts multimodal EHR features and generates training, validation, and testing datasets for supervised phenotyping. To illustrate MIPA's utility, we benchmarked four phenotyping methods: ICD-based classifiers, keyword-driven Term Frequency-Inverse Document Frequency (TF-IDF) classifiers, supervised machine learning (ML) models, and LLMs on the task. Results: The final MIPA corpus consists of 1,388 expert-annotated discharge summaries. Annotation reliability was high (mean document-level kappa = 0.805, mean label-level kappa = 0.771), with 91% of disagreements resolved through consensus review. MIPA provides high-quality phenotype labels paired with structured EHR features and predefined train/validation/test splits for each phenotype. In the benchmarking case study, LLMs achieved the highest F1 scores in 13 of 16 phenotypes, particularly for conditions requiring contextual interpretation of clinical narrative, while supervised ML offered moderate improvements over rule-based baselines. 
Conclusion: MIPA is the first publicly available benchmark dataset dedicated to EHR phenotyping, combining expert-curated annotations, broad phenotype coverage, and a reproducible processing pipeline. By enabling standardized comparison across ICD-based heuristics, ML models, and LLMs, MIPA provides a durable reference resource to advance methodological development in automated phenotyping.
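
The abstract reports kappa values for two annotators but does not say which kappa variant was used. As an illustration only, the sketch below assumes Cohen's kappa for two raters and a single label. The function name and example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items.

    Assumption: this is one plausible reading of the "kappa" reported
    in the abstract, not a confirmed description of the authors' method.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

For a multi-label corpus, one option is to compute this per phenotype and average the results, which would correspond to a "mean label-level" figure. Whether the authors did so is not stated.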

16

From Carb Counting to Diagnosis: Real World Patient Uses and Attitudes Toward Large Language Models in Diabetes Management

Nkweteyim, R. N.; Shet, V. G.; Iregbu, S.; He, L.

2026-03-19 health informatics 10.64898/2026.03.10.26348079 medRxiv

Top 0.1%

8.2%

Managing diabetes-related conditions is time-intensive and cognitively demanding for patients and caregivers, requiring ongoing glucose monitoring, dietary regulation, physical activity planning, and continuous lifestyle adaptation. With the emergence of large language models (LLMs), patients have increasingly turned to these tools for information, guidance, and support. However, there is limited empirical understanding of which diabetes-related medical tasks patients delegate to LLMs and what their experiences are. To address this gap, we combined qualitative thematic analysis with LLM-assisted analysis to examine patient attitudes and real-world use cases in using LLMs for diabetes-related tasks. Our analysis identified diverse application areas, ranging from clinical interpretation to nutrition and diet support and disease management, among others. LLMs functioned not only as information sources but also as interpretive, analytical, decision-support, emotional, and logistical aids supporting patients' self-management. Last, we discuss implications for integrating LLMs into patients' self-management support ecosystems and identify areas that require support and safeguards.

17

Electronic health record implementation: how to reduce the possible negative impacts

Calderon, P. F.; Wolosker, N.

2026-03-25 health informatics 10.64898/2026.03.24.26347438 medRxiv

Top 0.1%

8.2%

Objective: Develop a methodology to implement action plans that mitigate the negative impacts associated with the EHR implementation project and evaluate their effectiveness in reducing these issues. Methods: The research involved the development of mitigation plans for the potential negative impacts of implementing an electronic health record system, ensuring their execution and subsequently analyzing the effectiveness of the method. Results: Findings confirmed that 19.3% of 264 identified impacts were resolved through 52 plans before Go Live. During Go Live, the remaining 213 impacts were addressed through 337 plans. Six months later, 190 impacts were confirmed, and the plans were considered effective or partially effective in 80.5% of cases. Conclusions: Effective governance, a multidisciplinary methodology, and well-planned and executed actions increase the likelihood of success for health technology projects.

18

Embedded point of care stratified block randomization: demonstration of the Point of Care Randomization (POCR) engine with an electronic health record pragmatic clinical trial

Sarkisian, C.; Ibrahim, K.; Vangala, S.; Villaflores, C. W.; Cheng, E. M.; Turner, W.; Leuchter, R. K.; Machado, A.; Tabar, J.; Verdeflor, J. A.; Purvis, J.; Goncharova, A.; Pletcher, M. J.

2026-01-28 health informatics 10.64898/2026.01.26.26344847 medRxiv

Top 0.1%

7.1%

We describe a new custom feature within our Epic Systems electronic health record (EHR) that automates stratified randomization at the point-of-care or order. As a demonstration use-case, we conducted a randomized trial of a provider-facing alert for short-interval HbA1c orders. Over 3 months the alert dramatically reduced repeat orders. This transportable clinical informatics application transforms health systems' ability to conduct pragmatic clinical trials and deliver clinical care within the EHR.
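
The abstract does not describe how the POCR engine is built. As a generic illustration of stratified permuted-block randomization, the sketch below keeps a separate shuffled block of arm assignments for each stratum and draws from it at each order. The class name, arm names, block size, and stratum keys are all hypothetical and do not reflect the Epic implementation.

```python
import random


class StratifiedBlockRandomizer:
    """Permuted-block randomization with an independent block per stratum."""

    def __init__(self, arms=("control", "alert"), block_size=4, seed=None):
        if block_size <= 0 or block_size % len(arms) != 0:
            raise ValueError("block_size must be a positive multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self._rng = random.Random(seed)
        # Maps stratum key -> assignments remaining in that stratum's current block.
        self._pending = {}

    def assign(self, stratum):
        """Return the next arm for a participant in the given stratum."""
        queue = self._pending.get(stratum)
        if not queue:
            # Start a new block: equal copies of each arm, in random order.
            queue = list(self.arms) * (self.block_size // len(self.arms))
            self._rng.shuffle(queue)
            self._pending[stratum] = queue
        return queue.pop()
```

Calling `assign("clinic_A")` repeatedly yields arms that are exactly balanced within every completed block of four for that stratum, regardless of how calls for other strata are interleaved. A production system would also need persistent state and concurrency control, which this sketch omits.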

19

Virtual Pooling Enables Accurate, End-to-End Multi-Institutional Study Execution and Causal Inference Without Centralized Data Sharing

Ahmad, I.; Ayati, A.; Liu, K.; Ko, S.; Bonine, N.; Tabano, D.; Malik, N.; Lyu, T.; Zheng, K.; Rudrapatna, V. A.; Gupta, T.

2026-03-26 health informatics 10.64898/2026.03.24.26349123 medRxiv

Top 0.1%

6.9%

Background: Multicenter retrospective studies often rely on bringing patient-level data together into a single repository, introducing substantial regulatory and operational barriers. Federated analytics provides a privacy-preserving alternative; however, existing implementations are complex to use, require extensive manual effort for data cleaning, preprocessing, and harmonization, and produce approximate rather than ground-truth results for many biostatistical methods. Virtual Pooling (VP) is a recently developed multicenter study execution platform designed to overcome these limitations. In this study, we evaluate whether VP can replicate a published multicenter retrospective study end-to-end (including data preprocessing, regression analysis, and causal inference) without centralized data aggregation. Methods: We deployed VP at the University of California, San Francisco (UCSF) and the University of California, Irvine (UCI) and attempted to replicate a published study of diabetic eye disease screening practices (UCSF N = 2,592; UCI N = 5,642). VP supported all phases of this two-center study, including data cleaning, harmonization, feature engineering, imputation, propensity score estimation, patient matching, and model estimation, all conducted through a single interface without manual coordination between centers. We verified preprocessing correctness and compared descriptive statistics and causal effect estimates with those from the original study, which relied on data transfers across the centers. We also measured the latency overhead introduced by VP. Results: VP was deployed without hospital infrastructure changes, new or non-standard governance agreements, or dedicated IT support. All preprocessing steps executed correctly, with individual preprocessing operations and descriptive statistics completing in under 1 second, logistic regression in under 10 seconds, and propensity score matching in under 30 seconds. 
Descriptive statistics for all 30 baseline covariates were numerically identical to the original study. Univariate regression results identifying predictors of completed screening were also identical, with recent eye clinic referral (OR = 56.7; 95% CI: 42.1-76.4) and history of eye disease (OR = 6.4; 95% CI: 5.6-7.4) as the strongest predictors. VP also reproduced pooled causal estimates of automated referrals, showing an increase in screening completion from 21% to 36% at UCSF and from 13% to 34% at UCI. Conclusion: VP enables accurate, end-to-end multicenter clinical studies without centralized data sharing. By providing a single interface that supports the full analytical workflow, from uncleaned and unharmonized data through statistical results, and by exactly reproducing pooled results, VP eliminates manual coordination and data transfers across centers. These findings validate its practical potential to transform multicenter retrospective studies, particularly in contexts where data sharing is time-consuming, bureaucratic, or restricted.

20

ChatGPT with Mixed-Integer Linear Programming for Precision Nutrition Recommendations

Alkeyeva, R.; Nagiyev, I.; Kim, D.; Nurmanova, B.; Omarova, Z.; Varol, H. A.; Chan, M.-Y.

2026-02-17 health informatics 10.64898/2026.02.14.26346312 medRxiv

Top 0.2%

6.7%

Background: The growing interest in applying artificial intelligence in personalized nutrition is challenged by the complex nature of dietary advice that must balance health, economic, and personal factors. Though automated solutions using either Linear Programming (LP) or Large Language Models (LLMs) already exist, they have significant drawbacks. LP often lacks personalization, whereas LLMs can be unreliable for precise calculations. Objectives: To develop and assess a model that integrates a Mixed Integer Linear Programming (MILP) solver with an LLM to generate personalized meal plans and compare it with standalone LLM and MILP models. Methods: The proposed hybrid MILP+LLM model first uses an LLM (GPT-4o) to filter a unified food dataset (n=297), which combines regional Central Asian and global food items, according to the user's profile. The filtered list of food items is then passed to a MILP solver, which identifies the set of top 10 optimal solutions. Finally, given this set of solutions, the LLM chooses the most appropriate meal plan. The model was evaluated using five synthesized, clinically complex patient profiles sourced from Adilmetova et al. [4]. The performance of this hybrid model was compared against standalone MILP and LLM using a 5-point Likert scale with Kruskal-Wallis and post hoc Dunn's tests for Nutrient Accuracy, Personalization, Practicality, and Variety. Results: Findings demonstrated that the proposed MILP+LLM model reached balanced performance, achieving scores of more than 3.6 points in all criteria, with high scores in Nutrient Accuracy (3.96), Personalization (3.81), and Practicality (3.99). The standalone LLM model performed the weakest in all criteria, with statistically significant lower scores compared to the other two methods. The standalone MILP model performed best in Nutrient Accuracy (4.93) and in Variety (4.10) but lagged behind the MILP+LLM model in Practicality and Personalization. 
Kruskal-Wallis and Dunn's tests showed MILP and MILP+LLM outperformed LLM across all criteria. MILP was more accurate (p<0.0001), while the MILP+LLM model was more practical (p=0.021). Conclusions: The findings suggest that integrating the LLM with the MILP solver creates a model that combines qualitative personalization with quantitative precision. This model produces comprehensive, reliable meal plans, addressing the limitations of using either model alone.